Structurally-constrained optical-flow-guided adversarial generation of synthetic CT for MR-only radiotherapy treatment planning

The rapid progress in image-to-image translation methods using deep neural networks has led to advancements in the generation of synthetic CT (sCT) in MR-only radiotherapy workflow. Replacement of CT with MR reduces unnecessary radiation exposure, financial cost and enables more accurate delineation of organs at risk. Previous generative adversarial networks (GANs) have been oriented towards MR to sCT generation. In this work, we have implemented multiple augmented cycle consistent GANs. The augmentation involves structural information constraint (StructCGAN), optical flow consistency constraint (FlowCGAN) and the combination of both the conditions (SFCGAN). The networks were trained and tested on a publicly available Gold Atlas project dataset, consisting of T2-weighted MR and CT volumes of 19 subjects from 3 different sites. The network was tested on 8 volumes acquired from the third site with a different scanner to assess the generalizability of the network on multicenter data. The results indicate that all the networks are robust to scanner variations. The best model, SFCGAN achieved an average ME of 0.9 5.9 HU, an average MAE of 40.4 4.7 HU and 57.2 1.4 dB PSNR outperforming previous research works. Moreover, the optical flow constraint between consecutive frames preserves the consistency across all views compared to 2D image-to-image translation methods. SFCGAN exploits the features of both StructCGAN and FlowCGAN by delivering structurally robust and 3D consistent sCT images. The research work serves as a benchmark for further research in MR-only radiotherapy.


Scientific Reports
| (2022) 12:14855 | https://doi.org/10.1038/s41598-022-18256-y www.nature.com/scientificreports/ images, which can be coarsely divided into 4 categories, (1) Atlas-based methods, (2) Bulk density methods, (3) Probabilistic based methods and (4) machine learning methods Largent et al. 16 . Hofmann et al. 13 was the first atlas-based approach for the computational conversion of MR to CT, achieving an MAE of 100.7 Hounsfield Units (HU). The bulk density method was initially introduced as a method for sCT generation. Anatomical contours from MR can be used to generate bulk anatomical density (BAD) maps Choi et al. 6 . Recent advancements in machine learning in the medical imaging field led to the development of sophisticated methods for synthetic image generation Ching et al. 5 . In 2013, Andreasen 2 proposed an approach to investigate the use of Random Forest regression (RaFR) and Gaussian Mixture Regression (GMR) for MR to CT translation. The introduction of deep learning techniques made it possible to perform image-to-image translation generalizable across datasets. In 2014, Goodfellow et al. 10 proposed a framework to estimate generative models using adversarial training of generative and discriminative networks. Since then, GANs have been consistently proved to be a promising path for achieving image-to-image translation tasks Alotaibi 1 . In Han 11 , a deep learning method was first introduced to generate sCT by training a generative model i.e. U-net to convert 2D MR to CT slice. They obtained an average MAE of 84.817.3 HU across test data. Maspero et al. 18 used phase, water and fat images as input to the GAN in pelvis data and obtained an average MAE of 61.09.0 HU on test data. In Wolterink et al. 29 , a generative model was trained on an unpaired brain dataset, and Nie et al. 19 employed a model on a paired dataset. Maspero et al. 18 has used cGAN to obtain prostrate sCT for dosimetric calculations. So far, to our best knowledge, no deep learning based method has incorporated optical flow information for sCT generation.
This manuscript proposes deep learning models for MR to CT image generation in MR-only radiotherapy workflows for male pelvis data using a publicly available paired pelvis dataset. Test images have been taken from a scanner unseen to the trained models. This allows us to widen the scope of possibilities in training and improves the generalization of the conversion. Dosimetric (MAE and ME) and PSNR have been evaluated to quantify and compare the overall accuracy of sCT results with previously existing methods. The purpose of generating sCT from MR is to provide the electron density information for facilitating dose calculation in the absence of ground truth CT. The sCT encodes tissue information in HU which can be easily transformed to electron density in the treatment planning system (TPS) using a calibration curve. However, the dose calculation step is not incorporated in our research work, as the focus was to evaluate and compare the results among the proposed novel GANs.

Materials and methods
Dataset. This study was conducted on a publicly available Gold Atlas project dataset Nyholm et al. 21 , which aimed to provide means for training and validation for segmentation and sCT algorithms. The data consists of male pelvic region T1-and T2-weighted MR and CT volumes of 19 patients acquired from 3 different sites. The dataset also includes consensus delineations of nine organs for each patient based on the T2-weighted MR images. The CT volumes were deformed to fit the anatomy of MR data, enabling the use of the organ delineations on the registered CT. Table 1 provides further details of the dataset acquisition parameters.

Data preparation.
To remove the background noise from the scans, all the voxels outside the body in MR and CT images were set to 0 and -1024, respectively. The pelvis axial slices were converted to 16-bit grayscale images after truncating the intensity range of the original images. Similar to Fetty et al. 9 , the upper limit to truncate the image was kept as 1400 for CT, as any values greater than 1400 were identified in rare cases, mainly in the femoral bones, which is not considered in dose planning. For MR volumes, the upper limit was restricted to www.nature.com/scientificreports/ 99 percentile of all the intensity values to exclude outliers, followed by min-max normalization for each volume to normalize the intensity variations in MR scans. Inter-scan differences (air pockets and structures) have not been taken into account in this study. The first and last two axial slices were removed from the corpus to avoid the aliasing effects in MR images. All the images were resized to (256, 256) dimension using bicubic interpolation. To facilitate the direct comparison of the results, we have used the split proposed by Boni et al. 3 , i.e. site 2 and site 3 acquisitions were used as the training set, whereas site 1 was used in the testing set. Such a split also tests the robustness of the network across scanners as the testing set contains data taken from a completely different scanner. In total, there are 1015 axial slices or 11 patient volumes in the training set and 659 axial slices or 8 volumes in the test set.
Networks. The study implements 4 different types of Generative Adversarial Networks (GANs) with a shared neural network architecture of generator and discriminator trained with different sets of loss functions, enabling the fair comparison of different loss functions to learn the given task. The first network used to obtain a baseline score in this study is the traditional CycleGAN Zhu et al. 30 . The other two networks add structural similarity (StructCGAN) and optical flow consistency (FlowCGAN) constraints during the optimization step of the generator in the GAN. The 4th network augments generator loss function with both structural similarity and optical flow constraints. All the networks were trained in a supervised fashion to establish a bidirectional mapping between CT and T2-weighted MR axial slices.
All the GANs implemented in this study have two generators and two discriminators. The generator, G CT translated the MR image to produce synthetic CT, i.e. sCT = G CT (MR) . Similarly, we have sMR = G MR (CT) . The two discriminators, D CT and D MR aim to distinguish between real and synthetic images. The overall pipeline common to all 4 GANs is shown in the fig. 1.
The architecture of both the generators is ResNet He et al. 12 with 9 residual blocks. The discriminator has 5 convolutional blocks, each followed by instance normalization and ReLu non-linearity except the last layer. All ReLUs are leaky with a slope of 0.2. The first convolutional layer kernel has 64 channels, each with a size of (4, 4), and a stride of 2. Subsequently, in layer 2nd, 3rd and 4th , the number of channels in the kernel keep doubling without any changes in other parameters. The final convolutional block maps its input to a single number between 0 and 1, denoting the probability of prediction as a synthetic or real image. (CycleGAN). Traditionally, CycleGAN training has two types of terms in loss functions. The first corresponds to adversarial losses to match the data distribution of synthetic images with ground truth images. The second is cycle consistency loss which enables the model to learn a cycleconsistent mapping. Similar to Mao et al. 17 , least-square has been used instead of negative log-likelihood objective function to improve training stability and generate higher quality images. The generator is updated by fixing the discriminator parameters and vice-versa.

Loss functions. Cycle-consistent GAN
The least-squares GAN loss function is provided in Eq. (1). Upon backpropagation, it minimizes the possibility of the translated image being recognized as a synthesized image by the discriminator.
(1) Overall pipeline for all the models. The green section shows CT to MR translation and recovery back to input CT using MR and CT generators, respectively. Similarly, the blue part corresponds to MR to synthetic CT conversion steps. If the input of the discriminator is a synthesized image, then ideally discriminator should output 0, i.e. fake image. This is denoted by the red colour arrow. www.nature.com/scientificreports/ The cycle loss function ensures the cyclic consistency between the input and generated images. The aim is to recover the input image when the synthesized image is reconstructed back to its original imaging modality. The loss function is as follows: The net generator loss function is provided in Eq. (3). Here, cycle is taken as 10.
The discriminator loss corresponding to CT images is provided in Eq. (4). The aim is to predict label 1 for a real imaging modality and label 0 for a synthetic image.
Similarly, the loss corresponding to D MR is calculated. The net discriminator loss is the average of the discriminator loss functions of individual imaging modalities, shown below.
The above net discriminator loss function is the same for all of our models. In contrast, the net generator loss function changes case-by-case and is explicitly defined further in the manuscript. where, µ CT and µ sCT represent the average of the 2D axial image slice of the reference and synthesized CT images, respectively; σ CT and σ sCT are the variance of the reference and synthesized CT image respectively; σ CT,sCT is the covariance of the reference and synthesized CT image; and C1 and C2 are two hyper-parameters used to prevent unstability, and were set as 0.0001 and 0.009 respectively, to stabilize the division with the weak denominator. Similarly, the SSIM index for MR images is also calculated. The net SSIM index is the average SSIM index of individual imaging modalities. The structural similarity loss function is constructed by subtracting the net SSIM index from 1, shown below.

Structurally-Similar CycleGAN (StructCGAN
The authors of the SSIM index asserts that it is more practical to apply SSIM locally rather than globally. Thus, the SSIM metric was applied regionally on windows of size (11,11), and the overall mean was taken across all the regions to backpropagate the loss function. The generator net loss is shown in Eq. (8), wherein the SSIM is taken as 2, and cycle is taken as 8. CycleGAN (FlowCGAN). To the best of our knowledge, all the previous work in MR to sCT generation using adversarial methods used only 2D slices. Though it works quite well for individual 2D slices, there is an unavoidable discontinuity in the combined 3D volume. To facilitate interframe image consistency, we enforced L1 loss between a warped slice and a ground truth slice while training. The warping is done both forward and backward from the current slice, i.e. slices adjacent to a given slice in an epoch to enforce dual-sided optical flow consistency.

Optical-Flow-Guided
The optical flow is calculated in advance using the Farenback method Farnebäck 8 between the ground truth pairs adjacent slices. Figure 2 shows the dense optical flow field image and the corresponding warping from the current slice to the next for original CT pairs. The Farenback method for optical flow calculation is prone to negligible error, and the same is reflected in the rightmost image of Fig. 2. The optical flow loss equation for CT is as follows: Similarly, the optical flow loss for MR is also calculated. Furthermore, the optical flow loss of both the imaging modalities is averaged to obtain the net optical flow loss. The net generator loss for FlowCGAN is given in Eq. (10). Here, the flow is set as 2 and cycle as 8.  CycleGAN (SFCGAN). The idea behind this network was to integrate the previous two networks so that images have high structural integrity and are coherent across slices. Therefore, both structural similarity and optical flow consistency loss were used together during the network training. The net generator loss equation is communicated in Eq. (11). The hyperparameters, SSIM and flow were taken as 2 each, and and cycle is taken as 6.
Objective function. For every epoch, the generator was backpropagated followed by the discriminator. Moreover, following the strategy by Shrivastava et al. 24 to reduce model oscillation, the discriminator was updated using the buffer of 50 previously generated images, rather than the ones produced by the latest generators. The objective function is expressed as follows: Training. All the networks were trained for 200 epochs with a batch size of 4. Adam optimizer Kingma and Ba 15 was used with a learning rate of 0.0002. The learning rate was kept constant for the first 100 epochs and subsequently decayed linearly to zero over the next 100 epochs. The networks were implemented on PyTorch Paszke et al. 22  Comparison metrics. Similar to other research work in this field Maspero et al. 18 , Fetty et al. 9 , we have estimated mean absolute error (MAE), mean error (ME) and peak signal-to-noise ratio (PSNR) between the prediction and ground truth volume inside the body contour. These parameters are calculated using these Equations:

Results
Since only the generator without backpropagation is required during the test phase, hardly 1.5GB of GPU RAM was utilized. CT synthesis took an average of 8 secs on our GPU for all the 8 test patients. Table 2 provides the comparison metrics scores of the whole volume for all the GANs implemented in our study. The results for pix2pixHD and pix2pix taken from Boni et al. 3 is also shown for comparison. The comparison is fair as they have also used the same dataset splits for training and testing purposes. SFCGAN outperforms all the networks with an average ME of 0.    www.nature.com/scientificreports/ All the comparison metrics were also calculated for 8 segmented organs, and the results are provided in Table 3. Comparing MAE scores, SFCGAN performs the best in the case of anal canal, penile bulb, prostate and seminal vesicles. FlowCGAN outperforms all the networks in both left and right femoral heads. For urinary bladder and rectum, pix2pixHD performs the best.
In Fig. 3, boxplots of ME, MAE and PSNR have shown. It can be inferred that the MAE of FlowCGAN has the minimum IQR (Inter Quartile Range) followed by SFCGAN with a slight difference of 0.52 HU. Nevertheless, SFCGAN outperforms FlowCGAN as the difference between their MAE score is 2.26 HU. It can be seen that our baseline model CycleGAN has minimum variability, followed by SFCGAN and FlowCGAN. PSNR has been computed to evaluate the quality of sCT. SFCGAN achieved the highest median of 57.33 dB, followed by FlowCGAN of 57.28 dB. Figure 4 shows example output images of CycleGAN and SFCGAN. The coronal cross-section in the figure is obtained by stacking elements from axial slices. The axial-coronal slice pairs are taken from different patients and correspond to different pelvis segments for diverse visual inspection.

Discussion
We inspected a new deep learning method for MR to sCT generation for dose calculation. Proposed methods consist of training 3 kinds of models (StructCGAN, FlowCGAN, SFCGAN) with CycleGAN as the base model trained with paired pelvis dataset. We evaluated ( Table 2) the synthetic CT images generated by comparing their intensities with original CT using MAE and ME. The results showed that SFCGAN achieved high compliance between the HU maps for both CT images by showing the lowest ME and MAE of -0.9 5.9 and 40.4 4.7 respectively. Also, PSNR has been used to evaluate the quality of reconstructed images and SFCGAN achieved the highest 57.2 1.4, which is superior to our baseline model CycleGAN. Figure 4 shows the output generated by SFCGAN. It can be seen that differences in the intensity values of soft tissues in original CT and sCT is minimal. Although sCT has brighter pixels at the edges compared to the CT, there is a considerable difference in intensity values at the bone and soft tissue boundaries. Additionally, incorporating optical flow information as an input enables our model to maintain the interframe consistency in sCT. The interframe consistency is reflected in the continuity of bones and tissues along the vertical axis in the coronal cross-section of SFCGAN results. Unsurprisingly, the same is absent in CycleGAN.
The study can be further improved by including a larger dataset. Unfortunately, to the best of our knowledge, there is no alternative publicly available paired pelvis dataset for research purposes. To minimize the limitation of the study posed by small dataset, the testing is performed on 8 pelvis volumes taken from a completely different site. The results are promising across unseen scanner data and indicate generalizability of the network across multiple scanners. These results show that our proposed model provides an accurate and reproducible sCT which can improve the robustness of MR-only radiotherapy workflow and reduce the clinical workload of performing a CT scan for the patients who are unable to undergo a CT scan due to radiation exposure.

Conclusion
In this work, we developed different types of CycleGAN models for generating synthetic medical images and compared the results against state-of-the-art results Boni et al. 3 . FlowCGAN can serve as an approximation to provide the accurate motion of pixel values in sCT images to maintain interframe consistency. StructCGAN generates CT images that are structurally robust and can provide better accuracy for diagnosing fractures. SFCGAN exploits the features of both StructCGAN and FlowCGAN by proving structurally robust and frame consistency in the sCT images. The model performed well on the dataset acquired from entirely different sites, showing the model's generalizability. The model serves as a benchmark for further research in MR-only radiotherapy and dose estimation in pelvis scans.

Data availability
The data that support the findings of this study are available from Tufve (record owner), but restrictions apply to the availability of these data, which were used under license for the current study. Data are however available from the owner upon reasonable request at https:// zenodo. org/ record/ 583096 We hereby confirm that all methods were carried out in accordance with relevant guidelines and regulations as publicly available data is used in the study. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.